Path: blob/master/Part 3 - Classification/Decision Tree/[Python] Decision Tree.ipynb
Kernel: Python 3
Decision Tree
Data preprocessing
In [1]:
In [2]:
In [3]:
Out[3]:
In [4]:
In [5]:
Out[5]:
array([[    27,  57000],
       [    46,  28000],
       [    39, 134000],
       [    44,  39000],
       [    57,  26000],
       [    32, 120000],
       [    41,  52000],
       [    48,  74000],
       [    26,  86000],
       [    22,  81000]])
In [6]:
Out[6]:
array([[    46,  22000],
       [    59,  88000],
       [    28,  44000],
       [    48,  96000],
       [    29,  28000],
       [    30,  62000],
       [    47, 107000],
       [    29,  83000],
       [    40,  75000],
       [    42,  65000]])
In [7]:
Out[7]:
array([0, 1, 1, 0, 1, 1, 0, 1, 0, 0])
In [8]:
Out[8]:
array([0, 1, 0, 1, 0, 0, 1, 0, 0, 0])
In [9]:
Out[9]:
/home/baka/Programs/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py:475: DataConversionWarning: Data with input dtype int64 was converted to float64 by StandardScaler.
warnings.warn(msg, DataConversionWarning)
In [10]:
Out[10]:
array([[-1.06675246, -0.38634438],
       [ 0.79753468, -1.22993871],
       [ 0.11069205,  1.853544  ],
       [ 0.60129393, -0.90995465],
       [ 1.87685881, -1.28811763],
       [-0.57615058,  1.44629156],
       [ 0.3069328 , -0.53179168],
       [ 0.99377543,  0.10817643],
       [-1.16487283,  0.45724994],
       [-1.55735433,  0.31180264]])
In [11]:
Out[11]:
array([[ 0.79753468, -1.40447546],
       [ 2.07309956,  0.51542886],
       [-0.96863208, -0.76450736],
       [ 0.99377543,  0.74814454],
       [-0.87051171, -1.22993871],
       [-0.77239133, -0.24089709],
       [ 0.89565505,  1.06812859],
       [-0.87051171,  0.36998156],
       [ 0.20881242,  0.13726589],
       [ 0.40505317, -0.15362871]])
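The preprocessing code cells above are empty in this rendering, so here is a minimal sketch of the usual steps (load, split, scale). The dataset contents, column layout, and seeds below are assumptions; a small synthetic array stands in for the real CSV:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the notebook's (Age, Salary) features -- an assumption,
# since the original loading code is not shown
rng = np.random.default_rng(42)
X = np.column_stack([rng.integers(18, 60, 400), rng.integers(15000, 150000, 400)])
y = (X[:, 1] > 80_000).astype(int)

# 80/20 train/test split with a fixed seed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Standardise with statistics learned from the training set only; passing
# int64 input is what triggers the DataConversionWarning shown above
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```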
Fitting Decision Tree Classifier to the Training set
In [12]:
Out[12]:
DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=None,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=42, splitter='best')
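The fitting cell itself is not shown, but the Out[12] repr pins down the hyperparameters: entropy as the split criterion and random_state=42, everything else at its default. A sketch with toy stand-in training data (an assumption):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy scaled features standing in for the training set above (assumption)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(80, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

# Same hyperparameters as visible in the Out[12] repr
classifier = DecisionTreeClassifier(criterion='entropy', random_state=42)
classifier.fit(X_train, y_train)
```

With no depth limit, the tree keeps splitting until every leaf is pure, which is exactly why it fits the training data perfectly and overfits, as the closing notes point out.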
Predicting the Test set results
In [13]:
In [14]:
Out[14]:
array([1, 1, 0, 0, 0, 0, 1, 0, 0, 0])
In [15]:
Out[15]:
array([0, 1, 0, 1, 0, 0, 1, 0, 0, 0])
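The prediction step is a single call to `predict` on the scaled test features; Out[14] and Out[15] above then preview the first predictions against the true labels. A self-contained sketch on toy data (the data here is an assumption):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy fitted tree standing in for the classifier trained above (assumption)
rng = np.random.default_rng(1)
X_train = rng.normal(size=(80, 2))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(size=(10, 2))

classifier = DecisionTreeClassifier(
    criterion='entropy', random_state=42).fit(X_train, y_train)

# One label (0 or 1) per test row, as in Out[14]
y_pred = classifier.predict(X_test)
```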
Making the Confusion Matrix
In [16]:
Out[16]:
array([[46,  6],
       [ 7, 21]])
The classifier made 46 + 21 = 67 correct predictions and 7 + 6 = 13 incorrect predictions.
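The matrix comes from `sklearn.metrics.confusion_matrix`: rows are true classes, columns are predicted classes, so the diagonal counts correct predictions and the off-diagonal counts errors. A sketch using just the ten previewed labels from Out[14] and Out[15] (the full 80-row test set gives the 67/13 split above):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# The ten test labels and predictions previewed above
y_test = np.array([0, 1, 0, 1, 0, 0, 1, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 0, 0, 1, 0, 0, 0])

cm = confusion_matrix(y_test, y_pred)
correct = np.trace(cm)          # diagonal: true negatives + true positives
incorrect = cm.sum() - correct  # off-diagonal: false positives + false negatives
```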
Visualising the Training set results
In [17]:
Out[17]:
Visualising the Test set results
In [18]:
Out[18]:
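The plots in Out[17] and Out[18] are the standard decision-boundary visualisation: predict over a dense mesh grid of the two scaled features, colour the regions, then scatter the actual points on top. A sketch with toy data (data, colours, and axis labels are assumptions; the headless backend is just so it runs anywhere):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.tree import DecisionTreeClassifier

# Toy fitted tree standing in for the classifier trained above (assumption)
rng = np.random.default_rng(2)
X_set = rng.normal(size=(80, 2))
y_set = (X_set[:, 0] > 0).astype(int)
classifier = DecisionTreeClassifier(
    criterion='entropy', random_state=42).fit(X_set, y_set)

# Dense grid over the feature plane; the 0.01 step sets plot resolution
# (and memory use -- see the note on feature scaling below)
X1, X2 = np.meshgrid(
    np.arange(X_set[:, 0].min() - 1, X_set[:, 0].max() + 1, 0.01),
    np.arange(X_set[:, 1].min() - 1, X_set[:, 1].max() + 1, 0.01))
Z = classifier.predict(np.c_[X1.ravel(), X2.ravel()]).reshape(X1.shape)

# Colour each region by the predicted class, then overlay the data points
plt.contourf(X1, X2, Z, alpha=0.75, cmap=ListedColormap(('red', 'green')))
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=('red', 'green')[i], label=j)
plt.xlabel('Age (scaled)')
plt.ylabel('Estimated Salary (scaled)')
plt.legend()
plt.savefig('decision_boundary.png')
```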
Things to remember when building a decision tree classifier:

- It normally overfits the data. As you can see in the training-set plot above, it tries to capture every red dot lying in the green region, if we look carefully.
- There is no need to scale the features, since a decision tree does not depend on Euclidean distance. We are using feature scaling here only to get a plot with better resolution: if you omit scaling in the case above, plotting raises a MemoryError, because the mesh grid over the unscaled salary axis becomes enormous.
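That scale-invariance is easy to demonstrate: a tree splits on thresholds, so an affine rescaling of the features moves the thresholds along with the data and leaves the predictions unchanged. A sketch on toy data (the data is an assumption):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Toy (Age, Salary) data -- an assumption, standing in for the real dataset
rng = np.random.default_rng(3)
X = np.column_stack(
    [rng.integers(18, 60, 200), rng.integers(15000, 150000, 200)]).astype(float)
y = (X[:, 1] > 80_000).astype(int)

# Fit the same tree once on raw features and once on standardised features
raw = DecisionTreeClassifier(criterion='entropy', random_state=42).fit(X, y)
sc = StandardScaler()
scaled = DecisionTreeClassifier(
    criterion='entropy', random_state=42).fit(sc.fit_transform(X), y)

# Splits are thresholds, so the monotonic rescaling changes no prediction
same = (raw.predict(X) == scaled.predict(sc.transform(X))).all()
```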